{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MobileNet-SSD Object Detection Example\n",
    "This example demonstrates the workflow to convert a publicly available TensorFlow model for object detection into CoreML, and verify its numerical correctness against the TensorFlow model.\n",
    "\n",
    "We recommend you go through the MNIST example (linear_mnist_example.ipynb) and Inception V3 example before this one, as they contain important documentation for the workflow.\n",
    "\n",
    "We use a MobileNet + SSD model provided by Google, which can be downloaded at this URL:\n",
    "https://storage.googleapis.com/download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_android_export.zip\n",
    "\n",
    "Please refer to the [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) for more details.\n",
    "\n",
    "Also, please refer to [here](https://developer.apple.com/documentation/coreml) for detailed documentation of CoreML."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import os, sys, zipfile\n",
    "from os.path import dirname\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow.core.framework import graph_pb2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the model and class label package\n",
    "mobilenet_ssd_url = 'https://storage.googleapis.com/download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_android_export.zip'\n",
    "example_dir = '/tmp/tfcoreml_ssd_example/'\n",
    "if not os.path.exists(example_dir):\n",
    "    os.makedirs(example_dir)\n",
    "mobilenet_ssd_fpath = example_dir + 'ssd_mobilenet_v1_android_export.zip'\n",
    "if sys.version_info[0] < 3:\n",
    "    import urllib\n",
    "    urllib.urlretrieve(mobilenet_ssd_url, mobilenet_ssd_fpath)\n",
    "else:\n",
    "    import urllib.request\n",
    "    urllib.request.urlretrieve(mobilenet_ssd_url, mobilenet_ssd_fpath)\n",
    "zip_ref = zipfile.ZipFile(mobilenet_ssd_fpath, 'r')\n",
    "zip_ref.extractall(example_dir)\n",
    "zip_ref.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the TF graph definition\n",
    "tf_model_path = example_dir + 'ssd_mobilenet_v1_android_export.pb'\n",
    "with open(tf_model_path, 'rb') as f:\n",
    "    serialized = f.read()\n",
    "tf.reset_default_graph()\n",
    "original_gdef = tf.GraphDef()\n",
    "original_gdef.ParseFromString(serialized)\n",
    "\n",
    "with tf.Graph().as_default() as g:\n",
    "    tf.import_graph_def(original_gdef, name='')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The full MobileNet-SSD TF model contains 4 subgraphs: *Preprocessor*, *FeatureExtractor*, *MultipleGridAnchorGenerator*, and *Postprocessor*. Here we will extract the *FeatureExtractor* from the model and strip off the other subgraphs, as these subgraphs contain structures not currently supported in CoreML. The tasks in *Preprocessor*, *MultipleGridAnchorGenerator* and *Postprocessor* subgraphs can be achieved by other means, although they are non-trivial.\n",
    "\n",
    "By inspecting TensorFlow GraphDef, it can be found that:\n",
    "(1) the input tensor of MobileNet-SSD Feature Extractor is `Preprocessor/sub:0` of shape `(1,300,300,3)`, which contains the preprocessed image.\n",
    "(2) The output tensors are: `concat:0` of shape `(1,1917,4)`, the box coordinate encoding for each of the 1917 anchor boxes; and `concat_1:0` of shape `(1,1917,91)`, the confidence scores (logits) for each of the 91 object classes (including 1 class for background), for each of the 1917 anchor boxes.\n",
    "So we extract the feature extractor out as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Strip unused subgraphs and save it as another frozen TF model\n",
    "from tensorflow.python.tools import strip_unused_lib\n",
    "from tensorflow.python.framework import dtypes\n",
    "from tensorflow.python.platform import gfile\n",
    "input_node_names = ['Preprocessor/sub']\n",
    "output_node_names = ['concat', 'concat_1']\n",
    "gdef = strip_unused_lib.strip_unused(\n",
    "        input_graph_def = original_gdef,\n",
    "        input_node_names = input_node_names,\n",
    "        output_node_names = output_node_names,\n",
    "        placeholder_type_enum = dtypes.float32.as_datatype_enum)\n",
    "# Save the feature extractor to an output file\n",
    "frozen_model_file = example_dir + 'ssd_mobilenet_feature_extractor.pb'\n",
    "with gfile.GFile(frozen_model_file, \"wb\") as f:\n",
    "    f.write(gdef.SerializeToString())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we have a TF model ready to be converted to CoreML\n",
    "import tfcoreml\n",
    "# Supply a dictionary of input tensors' name and shape (with # batch axis)\n",
    "input_tensor_shapes = {\"Preprocessor/sub:0\":[1,300,300,3]} # batch size is 1\n",
    "# Output CoreML model path\n",
    "coreml_model_file = example_dir + 'ssd_mobilenet_feature_extractor.mlmodel'\n",
    "# The TF model's ouput tensor name\n",
    "output_tensor_names = ['concat:0', 'concat_1:0']\n",
    "\n",
    "# Call the converter. This may take a while\n",
    "coreml_model = tfcoreml.convert(\n",
    "        tf_model_path=frozen_model_file,\n",
    "        mlmodel_path=coreml_model_file,\n",
    "        input_name_shape_dict=input_tensor_shapes,\n",
    "        output_feature_names=output_tensor_names)\n",
    "\n",
    "# CoreML saved at location: /tmp/tfcoreml_ssd_example/ssd_mobilenet_feature_extractor.mlmodel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have converted the model to CoreML, we can test its numerical correctness by comparing it with TensorFlow model. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load an image as input\n",
    "import PIL.Image\n",
    "import requests\n",
    "from io import BytesIO\n",
    "from matplotlib.pyplot import imshow\n",
    "img_url = 'https://upload.wikimedia.org/wikipedia/commons/9/93/Golden_Retriever_Carlos_%2810581910556%29.jpg'\n",
    "response = requests.get(img_url)\n",
    "%matplotlib inline\n",
    "img = PIL.Image.open(BytesIO(response.content))\n",
    "imshow(np.asarray(img))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Preprocess the image - normalize to [-1,1]\n",
    "img = img.resize([300,300], PIL.Image.ANTIALIAS)\n",
    "img_array = np.array(img).astype(np.float32) * 2.0 / 255 - 1\n",
    "batch_img_array = img_array[None,:,:,:]\n",
    "\n",
    "# Evaluate TF\n",
    "tf.reset_default_graph()\n",
    "g = tf.import_graph_def(gdef)\n",
    "\n",
    "tf_input_name = 'Preprocessor/sub:0'\n",
    "# concat:0 are the bounding-box encodings of the 1917 anchor boxes\n",
    "# concat_1:0 are the confidence scores of 91 classes of anchor boxes\n",
    "tf_output_names = ['concat:0', 'concat_1:0']\n",
    "with tf.Session(graph = g) as sess:\n",
    "    image_input_tensor = sess.graph.get_tensor_by_name(\"import/\" + tf_input_name)\n",
    "    tf_output_tensors = [sess.graph.get_tensor_by_name(\"import/\" + output_name)\n",
    "                         for output_name in tf_output_names]\n",
    "    tf_output_values = sess.run(tf_output_tensors, \n",
    "                                feed_dict={image_input_tensor: batch_img_array})\n",
    "    tf_box_encodings, tf_scores = tf_output_values\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we evaluate CoreML model and compare result against TensorFlow model.\n",
    "CoreML uses 5D arrays to represent rank-1 to rank-5 tensors. The 5 axes are in the order of `(S,B,C,H,W)`, where S is sequence length, B is batch size, C is number of channels, H is height and W is width. This data layout is usually different from TensorFlow's default layout, where a rank-4 tensor for convolutional nets usually uses `(B,H,W,C)` layout. To make a comparison, one of the result should be transposed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import coremltools\n",
    "# Input shape should be [1,1,3,300,300]\n",
    "mlmodel_path = example_dir + 'ssd_mobilenet_feature_extractor.mlmodel'\n",
    "img_array_coreml = np.transpose(img_array, (2,0,1))[None,None,:,:,:]\n",
    "mlmodel = coremltools.models.MLModel(mlmodel_path)\n",
    "# Pay attention to '__0'. We change ':0' to '__0' to make sure MLModel's \n",
    "# generated Swift/Obj-C code is semantically correct\n",
    "coreml_input_name = tf_input_name.replace(':', '__').replace('/', '__')\n",
    "coreml_output_names = [output_name.replace(':', '__').replace('/', '__') \n",
    "                       for output_name in tf_output_names]\n",
    "coreml_input = {coreml_input_name: img_array_coreml}\n",
    "\n",
    "# When useCPUOnly == True, Relative error should be around 0.001\n",
    "# When useCPUOnly == False on GPU enabled devices, relative errors \n",
    "# are expected to be larger due to utilization of lower-precision arithmetics\n",
    "\n",
    "coreml_outputs_dict = mlmodel.predict(coreml_input, useCPUOnly=True)\n",
    "coreml_outputs = [coreml_outputs_dict[out_name] for out_name in \n",
    "                  coreml_output_names]\n",
    "coreml_box_encodings, coreml_scores = coreml_outputs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we compare the differences of two results\n",
    "def max_relative_error(x,y):\n",
    "    den = np.maximum(x,y)\n",
    "    den = np.maximum(den,1)\n",
    "    rel_err = (np.abs(x-y))/den\n",
    "    return np.max(rel_err)\n",
    "\n",
    "rel_error_box = max_relative_error(coreml_box_encodings.squeeze(), \n",
    "        np.transpose(tf_box_encodings.squeeze(),(1,0)))\n",
    "rel_error_score = max_relative_error(coreml_scores.squeeze(), \n",
    "        np.transpose(tf_scores.squeeze(),(1,0)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Max relative error on box encoding: %f' %(rel_error_box))\n",
    "print('Max relative error on scores: %f' %(rel_error_score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Up to this point we have converted the MobileNet-SSD feature extractor. The remaining tasks are post-processing tasks including generating anchor boxes, decoding the bounding-boxes, and performing non-maximum suppression. These necessary tasks are not trivial; however, CoreML does not contain out-of-the-box support for these tasks at this time developers should write their own post-processing code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
